Transliterated Search using Syllabification Approach
Abstract
Machine transliteration is the automatic conversion of a word from the script of one language to that of another without losing its phonological characteristics. In this work, we present the experiments we performed in subtask-1 and subtask-2 of the FIRE-2013 Transliterated Search task. In both subtasks, transliteration from Roman script to Devanagari script was performed using a syllabification approach that converted Roman-script (English-alphabet) words into Hindi. In the query labeling subtask (subtask-1), English and Hindi words were identified using a hybrid approach that combined morphological analysis of English words with a corpus-based approach to identify frequently occurring Hindi words. In the multi-script ad hoc retrieval of Hindi song lyrics subtask (subtask-2), separate runs were submitted with queries formulated in both Roman and Devanagari scripts and with queries in Roman script only. The evaluation showed a relatively high recall for query labeling in subtask-1, whereas the results of subtask-2 indicate average performance.
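The abstract does not spell out the syllabification rules or the labeling heuristics, so the following is only a minimal Python sketch of the two ideas under stated assumptions: a toy Roman-to-Devanagari table, syllables of the form consonant cluster plus vowel (C*V), a guessed rule that a word-final "a" is long, and a tiny hypothetical English root list and Hindi frequency table standing in for a real morphological analyser and corpus. All names (`tokenize`, `syllabify`, `transliterate`, `label_word`, `ENGLISH_ROOTS`, `HINDI_FREQ`) are illustrative and not taken from the authors' system.

```python
from typing import List, Optional, Tuple

# Toy Roman-to-Devanagari tables (hypothetical; a real system needs far larger,
# context-sensitive tables, e.g. to choose between त and ट for "t").
CONSONANTS = {
    "chh": "छ", "kh": "ख", "gh": "घ", "ch": "च", "jh": "झ",
    "th": "थ", "dh": "ध", "ph": "फ", "bh": "भ", "sh": "श",
    "k": "क", "g": "ग", "j": "ज", "t": "त", "d": "द", "n": "न",
    "p": "प", "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल",
    "v": "व", "w": "व", "s": "स", "h": "ह", "f": "फ",
}
# vowel -> (independent form, dependent sign / matra); "a" is inherent, so no matra
VOWELS = {
    "aa": ("आ", "\u093e"), "ai": ("ऐ", "\u0948"), "au": ("औ", "\u094c"),
    "ee": ("ई", "\u0940"), "ii": ("ई", "\u0940"),
    "oo": ("ऊ", "\u0942"), "uu": ("ऊ", "\u0942"),
    "a": ("अ", ""), "i": ("इ", "\u093f"), "u": ("उ", "\u0941"),
    "e": ("ए", "\u0947"), "o": ("ओ", "\u094b"),
}
VIRAMA = "\u094d"  # joins consonants into a conjunct

Syllable = Tuple[List[str], Optional[str]]  # (consonant cluster, vowel or None)


def tokenize(word: str) -> List[Tuple[str, str]]:
    """Split a Roman-script word into ("C", unit) / ("V", unit), longest match first."""
    units, i = [], 0
    while i < len(word):
        for size in (3, 2, 1):
            piece = word[i:i + size]
            if len(piece) < size:
                continue
            if piece in CONSONANTS:
                units.append(("C", piece))
                break
            if piece in VOWELS:
                units.append(("V", piece))
                break
        else:
            raise ValueError(f"no mapping for {word[i]!r} in the toy table")
        i += size
    return units


def syllabify(units: List[Tuple[str, str]]) -> List[Syllable]:
    """Group units into C*V syllables; a trailing vowel-less cluster is kept on its own."""
    syllables: List[Syllable] = []
    onset: List[str] = []
    for kind, piece in units:
        if kind == "C":
            onset.append(piece)
        else:
            syllables.append((onset, piece))
            onset = []
    if onset:
        syllables.append((onset, None))
    return syllables


def render(syllables: List[Syllable]) -> str:
    out = []
    last = len(syllables) - 1
    for idx, (onset, vowel) in enumerate(syllables):
        if not onset:
            out.append(VOWELS[vowel][0])  # vowel-initial syllable: independent vowel
            continue
        out.append(VIRAMA.join(CONSONANTS[c] for c in onset))  # conjunct cluster
        if vowel is None:
            continue  # word-final cluster: last consonant written bare, as in Hindi spelling
        if vowel == "a" and idx == last:
            vowel = "aa"  # assumed heuristic: romanized word-final "a" is usually long ("tera")
        out.append(VOWELS[vowel][1])
    return "".join(out)


def transliterate(word: str) -> str:
    return render(syllabify(tokenize(word.lower())))


# Hypothetical resources standing in for the English morphological analysis
# and the Hindi corpus frequencies mentioned in the abstract.
ENGLISH_ROOTS = {"play", "song", "love", "walk"}
ENGLISH_SUFFIXES = ("ing", "ed", "s")
HINDI_FREQ = {"hai": 950, "dil": 420, "tera": 310, "pyaar": 280}


def label_word(word: str, min_freq: int = 100) -> str:
    """Return "H" for a frequent Hindi word, "E" if suffix stripping finds an English root."""
    w = word.lower()
    if HINDI_FREQ.get(w, 0) >= min_freq:
        return "H"
    stems = {w} | {w[:-len(s)] for s in ENGLISH_SUFFIXES if w.endswith(s) and len(w) > len(s)}
    if stems & ENGLISH_ROOTS:
        return "E"
    return "H"  # assumption: unknown words default to Hindi in a Hindi-song query log
```

For example, `transliterate("pyaar")` gives प्यार and `transliterate("dost")` gives दोस्त, while `label_word("playing")` returns `"E"` because stripping "-ing" leaves the listed root "play". The sketch ignores nasalization (so "main" comes out as मैन rather than मैं) and the dental/retroflex ambiguity, both of which a real system would have to resolve from context or corpus statistics.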
Similar resources
LIGA and Syllabification Approach for Language Identification and Back Transliteration: Shared Task Report by DAIICT
This paper addresses Subtask 1 of the Shared Task on Transliterated Search, a task in FIRE ’14. The task addresses the problem of data containing English words together with words of Indian languages transliterated into English (Roman script). The task calls for language identification and subsequent back transliteration into the native Indian scripts. The system proposed herewith implements Language ...
Constructing Transliteration Lexicons from Web Corpora
This paper proposes a novel approach to automating the construction of transliterated-term lexicons. A simple syllable alignment algorithm is used to construct confusion matrices for cross-language syllable-phoneme conversion. Each row in the confusion matrix consists of a set of syllables in the source language that are (correctly or erroneously) matched phonetically and statistically to a syl...
A Relevance feedback based approach for mixed script transliterated text search: Shared Task report by BIT Mesra, India
This paper describes the experiments carried out as part of our participation in the FIRE-2014 Transliterated Search Shared Task. We participated in subtask-2 and submitted two results generated by systems based on a relevance feedback approach. Given a collection of documents in mixed script, the task is to retrieve relevant documents using queries in either script. The spelling variation between dif...
Encoding transliteration variation through dimensionality reduction: FIRE Shared Task on Transliterated Search
There exists a large amount of user-generated Web content in Roman script, for various reasons, for languages that are written in indigenous scripts. In light of this phenomenon, search engines face the non-trivial problem of matching queries and documents in transliterated space, where transliterated content contains extensive spelling variation. This paper describes our proposed method ...
Generating Paired Transliterated-cognates Using Multiple Pronunciation Characteristics from Web corpora
A novel approach to automatically extracting paired transliterated-cognates from Web corpora is proposed in this paper. One of the most important issues addressed is that of taking multiple pronunciation characteristics into account. Terms from various languages may be pronounced very differently. Incorporating knowledge of word origin may improve the pronunciation accuracy of terms. The accura...